CLAIMS 



What is claimed is: 

1. A method for mining a document containing dirty text comprising:
removing an instance of dirty text within said document to produce
a cleaned document; and

performing a data mining operation on said cleaned document. 

2. The method for mining a document containing dirty text as recited
in Claim 1, wherein said removing further comprises replacing an instance of
dirty text with a standard term.

3. The method for mining a document containing dirty text as recited
in Claim 1, wherein said removing further comprises removing an instance of
computer code from said document. 

4. The method for mining a document containing dirty text as recited 
in Claim 1, wherein said removing further comprises removing a table from said
document. 

5. The method for mining a document containing dirty text as recited
in Claim 1, wherein said performing a data mining operation further comprises
identifying a sentence within said cleaned document by identifying a beginning
and an end of said sentence.

6. The method for mining a document containing dirty text as recited
in Claim 5, wherein said performing a data mining operation further comprises 
scoring and ranking said sentence.

7. The method for mining a document containing dirty text as recited
in Claim 6, wherein scoring said sentence further comprises:

selecting scoring techniques operable for summarizing non- 
narrative, grammatically incorrect text; 

selecting scoring techniques operable for summarizing narrative,
grammatically correct text; and

using said scoring techniques to score said sentence.

8. The method for mining a document containing dirty text as recited
in Claim 7, wherein said method further comprises generating a summary 
derived from said scored and ranked sentences. 

9. The method for mining a document containing dirty text as recited
in Claim 1, wherein said method further comprises selecting a text mining
component based upon said data mining operation to be performed. 

10. The method for mining a document containing dirty text as recited 
in Claim 1, wherein said method further comprises customizing said method by
adjusting a parameter value.
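
The steps recited in Claims 1 through 10 can be illustrated with a minimal sketch. It is not the claimed implementation. The term mapping, the regular expressions used to detect computer code and tables, the sentence-boundary rule, and the word-frequency scoring are all assumptions chosen for illustration, and every name below (STANDARD_TERMS, clean_document, split_sentences, score_sentences, summarize, max_sentences) is hypothetical. In particular, the frequency score stands in for the unspecified scoring techniques of Claim 7.

```python
import re
from collections import Counter

# Hypothetical mapping from dirty variants to standard terms (Claim 2).
STANDARD_TERMS = {"pls": "please", "cust": "customer", "w/": "with"}

# Assumed heuristics for spotting computer code (Claim 3) and table rows (Claim 4).
CODE_LINE = re.compile(r"[{};]\s*$|^\s*(def |class |import |#include)")
TABLE_LINE = re.compile(r"\|.*\||\t.*\t")


def clean_document(text, standard_terms=STANDARD_TERMS):
    """Drop code-like and table-like lines, then normalize dirty terms."""
    kept = [line for line in text.splitlines()
            if not CODE_LINE.search(line) and not TABLE_LINE.search(line)]
    words = [standard_terms.get(w.lower(), w) for w in " ".join(kept).split()]
    return " ".join(words)


def split_sentences(text):
    """Identify sentences by their beginning and end (Claim 5)."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentences(sentences):
    """Score each sentence by mean word frequency, then rank (Claim 6)."""
    tokens = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    freq = Counter(w for toks in tokens for w in toks)
    scored = []
    for sentence, toks in zip(sentences, tokens):
        score = sum(freq[w] for w in toks) / len(toks) if toks else 0.0
        scored.append((score, sentence))
    # Highest score first; ties keep document order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def summarize(text, max_sentences=2):
    """Build a summary from the top-ranked sentences (Claim 8).

    max_sentences is an adjustable parameter value in the sense of Claim 10.
    """
    ranked = score_sentences(split_sentences(clean_document(text)))
    return [sentence for _, sentence in ranked[:max_sentences]]


if __name__ == "__main__":
    doc = ("The cust reported a printer error.\n"
           "int x = 0;\n"
           "| col | val |\n"
           "Pls reset the printer w/ care.")
    print(clean_document(doc))
    print(summarize(doc, max_sentences=1))
```

Cleaning runs before sentence identification so that code fragments and table rows cannot be mistaken for sentences, which mirrors the order of the steps in Claim 1.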



11. A computer system comprising: 
a bus; 

a memory unit coupled to said bus; and 

a processor coupled to said bus, said processor for executing a
method for mining a document containing dirty text comprising: 

removing an instance of dirty text within said document to produce
a cleaned document; and

performing a data mining operation on said cleaned document.

12. The computer system as recited in Claim 11, wherein said
removing further comprises replacing an instance of dirty text with a standard 
term. 

13. The computer system as recited in Claim 11, wherein said
removing further comprises removing an instance of computer code from said
document. 

14. The computer system as recited in Claim 11, wherein said
removing further comprises removing a table from said document.

15. The computer system as recited in Claim 11, wherein said
performing a data mining operation further comprises identifying a sentence
within said cleaned document by identifying a beginning and an end of said 
sentence. 

16. The computer system as recited in Claim 15, wherein said 
performing a data mining operation further comprises scoring and ranking said 
sentence. 



17. The computer system as recited in Claim 16, wherein scoring said
sentence further comprises:

selecting scoring techniques operable for summarizing non- 
narrative, grammatically incorrect text; 

selecting scoring techniques operable for summarizing narrative,
grammatically correct text; and

using said scoring techniques to score said sentence.

18. The computer system as recited in Claim 17, wherein said method
further comprises generating a summary derived from said scored and ranked
sentences.

19. The computer system as recited in Claim 11, wherein said method
further comprises selecting a text mining component based upon said data
mining operation to be performed. 

20. The computer system as recited in Claim 11, wherein said method
further comprises customizing said method by adjusting a parameter value. 

21. A computer-usable medium having computer-readable program
code embodied therein for causing a computer system to perform the steps of:

removing an instance of dirty text within said document to produce
a cleaned document; and

performing a data mining operation on said cleaned document.

22. The computer-usable medium of Claim 21, wherein said removing
further comprises replacing an instance of dirty text with a standard term.

23. The computer-usable medium recited in Claim 21, wherein said
removing further comprises removing an instance of computer code from said
document.

24. The computer-usable medium recited in Claim 21, wherein said
removing further comprises removing a table from said document.

25. The computer-usable medium recited in Claim 21, wherein said
performing a data mining operation further comprises identifying a sentence
within said cleaned document by identifying a beginning and an end of said
sentence. 

26. The computer-usable medium recited in Claim 25, wherein said
performing a data mining operation further comprises scoring and ranking said 
sentence. 

27. The computer-usable medium recited in Claim 26, wherein
scoring said sentence further comprises:

selecting scoring techniques operable for summarizing non-
narrative, grammatically incorrect text; 

selecting scoring techniques operable for summarizing narrative,
grammatically correct text; and

using said scoring techniques to score said sentence. 

28. The computer-usable medium recited in Claim 27, wherein said
method further comprises generating a summary derived from said scored and
ranked sentences.

29. The computer-usable medium as recited in Claim 21, wherein
said method further comprises selecting a text mining component based upon 
said data mining operation to be performed. 



30. The computer-usable medium as recited in Claim 21, wherein
said method further comprises customizing said method by adjusting a
parameter value.